In-RDBMS Hardware Acceleration of Advanced Analytics
نویسندگان
چکیده
The data revolution is fueled by advances in several areas, including databases, high-performance computer architecture, and machine learning. Although timely, there is a void of solutions that brings these disjoint directions together. This paper sets out to be the initial step towards such a union. The aim is to devise a solution for the in-Database Acceleration of Advanced Analytics (DAnA). DAnA empowers database users to leap beyond traditional data summarization techniques and seamlessly utilize hardware-accelerated machine learning. Deploying specialized hardware, such as FPGAs, for in-database analytics currently requires hand-designing the hardware and manually routing the data. Instead, DAnA automatically maps a high-level specification of indatabase analytics queries to the FPGA accelerator. The accelerator implementation is generated from a User Defined Function (UDF), expressed as part of a SQL query in a Python-embedded Domain-Specific Language (DSL). To realize efficient in-database integration, DAnA-generated accelerators contain a novel hardware structure, Striders, that directly interface with the buffer pool of the database. DAnA obtains the schema and page layout information from the database catalog to configure the Striders. In turn, Striders extract, cleanse, and process the training data tuples, which are consumed by a multi-threaded FPGA engine that executes the analytics algorithm. We integrated DAnA with PostgreSQL to generate hardware accelerators for a range of real-world and synthetic datasets running diverse ML algorithms. Results show that DAnA-enhanced PostgreSQL provides, on average, 11.3× endto-end speedup for real datasets, with the maximum at 58.2×. Moreover, DAnA-enhanced PostgreSQL is 5.4× faster, on average, than the multi-threaded Apache MADLib running on Greenplum. DAnA provides these benefits while hiding the complexity of hardware design from data scientists and allowing them to express the algorithm in ≈30-60 lines of Python code.
منابع مشابه
Application of Big Data Analytics in Power Distribution Network
Smart grid enhances optimization in generation, distribution and consumption of the electricity by integrating information and communication technologies into the grid. Today, utilities are moving towards smart grid applications, most common one being deployment of smart meters in advanced metering infrastructure, and the first technical challenge they face is the huge volume of data generated ...
متن کاملA Fuzzy TOPSIS Approach for Big Data Analytics Platform Selection
Big data sizes are constantly increasing. Big data analytics is where advanced analytic techniques are applied on big data sets. Analytics based on large data samples reveals and leverages business change. The popularity of big data analytics platforms, which are often available as open-source, has not remained unnoticed by big companies. Google uses MapReduce for PageRank and inverted indexes....
متن کاملAdvanced Electron Linacs
The research into advanced acceleration concepts for electron linear accelerators being pursued at SLAC is reviewed. This research includes experiments in laser acceleration, plasma wakefield acceleration, and mmwavelength RF driven accelerators.
متن کاملComparison of Data Warehousing DBMS Platforms
Although relational databases (RDBMS) are the most common choice for data warehouse implementations, their record-based structure is far from ideal. As data volumes grow and users demand more sophisticated analytical capabilities, the deficiencies of the RDBMS to data storage become more conspicuous. RDBMS data warehouse systems are difficult to design; extremely inefficient in their use of dis...
متن کاملOn Current Strategies for Hardware Acceleration of Digital Image Restoration Filters
Two advanced design methodologies for hardware acceleration of a standard digital image restoration algorithm are explored and compared. The first one is the custom-designed hardware approach, leading to an application-specific integrated circuit (ASIC) implementation. The second one consists of the configurable processor approach, yielding a mixed hardware/software implementation running on a ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1801.06027 شماره
صفحات -
تاریخ انتشار 2018